Supersaturated plans for variable selection in large databases

Authors

  • C. Parpoula
  • C. Koukouvinos
  • S. Stylianou
  • Paulo Canas Rodrigues
Abstract

Over the last decades, advances in technology have made the collection and storage of data massive in scale, and variable selection has become a fundamental tool for high-dimensional statistical modelling problems. In this study we apply data mining techniques and metaheuristics, and use experimental designs on databases, in order to determine the variables most relevant for classification in regression problems when the observations and labels of a large database are available. We propose a database-driven scheme for the encryption of specific fields of a database in order to select an optimal supersaturated design consisting of those variables of the large database that have been found to influence the response significantly. The proposed design selection approach is promising, since it retrieves an optimal supersaturated plan using a very small percentage of the available runs, which makes the statistical analysis of a large database computationally feasible and affordable.
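The abstract does not state which optimality criterion or which metaheuristic is used to pick the supersaturated plan. As a minimal sketch under stated assumptions, suppose the selected database fields are coded as two-level factors (+1/-1) and use the widely known E(s²) criterion for two-level supersaturated designs: for factor columns i and j, s_ij is their inner product over the chosen runs, and E(s²) is the average of s_ij² over all column pairs (smaller means closer to orthogonal). A simple random-swap local search then picks a small subset of records (runs) from the large pool. The names `e_s2` and `select_runs`, the ±1 coding, and the swap search are all hypothetical illustrations; this is not the authors' scheme and it omits their field-encryption step.

```python
import itertools
import random


def e_s2(rows):
    """E(s^2) of a two-level design given as a list of +1/-1 rows (runs x factors).

    Assumes at least two factor columns. Smaller values mean the factor
    columns are closer to mutually orthogonal.
    """
    m = len(rows[0])
    total = 0
    for i, j in itertools.combinations(range(m), 2):
        # s_ij: inner product of factor columns i and j over all runs
        s_ij = sum(r[i] * r[j] for r in rows)
        total += s_ij * s_ij
    # Average over the m(m-1)/2 column pairs
    return total / (m * (m - 1) / 2)


def select_runs(pool, k, iters=2000, seed=0):
    """Hypothetical helper: choose k of the pool's rows to (locally) minimise E(s^2).

    Starts from a random k-subset and repeatedly swaps one chosen row for an
    unchosen one, keeping the swap whenever E(s^2) does not increase.
    """
    rng = random.Random(seed)
    chosen = rng.sample(range(len(pool)), k)
    best = e_s2([pool[i] for i in chosen])
    for _ in range(iters):
        new = rng.randrange(len(pool))
        if new in chosen:
            continue  # the candidate row is already in the design
        trial = chosen.copy()
        trial[rng.randrange(k)] = new
        score = e_s2([pool[i] for i in trial])
        if score <= best:
            chosen, best = trial, score
    return sorted(chosen), best


if __name__ == "__main__":
    data_rng = random.Random(1)
    # Hypothetical "database": 500 records with 10 two-level fields coded +/-1
    pool = [[data_rng.choice((-1, 1)) for _ in range(10)] for _ in range(500)]
    # 6 runs for 10 factors is supersaturated (fewer runs than factors + 1)
    runs, score = select_runs(pool, k=6)
    print(runs, score)
```

Accepting swaps that leave the score unchanged (`<=`) lets the search move across plateaus of equal E(s²). A fuller treatment would also check column balance and, as the abstract describes, restrict the columns to the variables already found to influence the response.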


Similar resources

Analysis Methods for Supersaturated Design: Some Comparisons

Supersaturated designs are very cost-effective with respect to the number of runs and as such are highly desirable in many preliminary studies in industrial experimentation. Variable selection plays an important role in analyzing data from supersaturated designs. Traditional approaches, such as best subset variable selection and stepwise regression, may not be appropriate in this situat...

An Overview of the New Feature Selection Methods in Finite Mixture of Regression Models

Variable (feature) selection has attracted much attention in contemporary statistical learning and recent scientific research. This is mainly due to the rapid advancement in modern technology that allows scientists to collect data of unprecedented size and complexity. One type of statistical problem in such applications is concerned with modeling an output variable as a function of a sma...

Analysis of Supersaturated Designs via the Dantzig Selector

A supersaturated design is a design whose run size is not enough for estimating all the main effects. It is commonly used in screening experiments, where the goals are to identify sparse and dominant active factors with low cost. In this paper, we study a variable selection method via the Dantzig selector, proposed by Candes and Tao (2007), to screen important effects. A graphical procedure and...

Analysis of Supersaturated Designs via Dantzig Selector

A supersaturated design is a design whose run size is not enough for estimating all the main effects. It is commonly used in screening experiments, where the goal is to identify sparse and dominant active effects with low cost. In this paper, we study a variable selection method via the Dantzig selector, proposed by Candes and Tao (2007), to screen active effects. A graphical procedure and an automa...

Trait adaptation promotes species coexistence in diverse predator and prey communities

Species can adjust their traits in response to selection, which may strongly influence species coexistence. Nevertheless, current theory mainly assumes distinct and time-invariant trait values. We examined the combined effects of the range and the speed of trait adaptation on species coexistence using an innovative multispecies predator-prey model. It allows for temporal trait changes of all pre...

Publication date: 2014